IBL for Replica Selection in Data-Intensive Grid Applications
Abstract
In many scientific applications, Grid technologies and infrastructures facilitate distributed resource sharing and coordination in dynamic, heterogeneous multi-institutional environments. Replication of data can help enable high-throughput file transfer and scalable resource storage in scientific Grid applications that involve large data transfers. The selection of a replica can, however, significantly influence the efficiency of a replication scheme. Many current approaches assume that a significant amount of data is available, such as network status information, log files of historical GridFTP file transfers, and CPU status and predictions. We propose a lightweight instance-based learning (IBL) algorithm to allow efficient replica selection with much less required data. We implement the approach and evaluate it in a Grid environment. Our evaluation demonstrates that the IBL approach can be an efficient tool for replica selection when only limited data sources are available.
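The abstract does not spell out the algorithm, so the following is only a generic illustration of instance-based learning applied to replica selection, not the authors' method. It assumes the only available data is a small local history of past transfers (site, file size, observed throughput) and predicts each candidate site's throughput from its k most similar past instances. The names `Transfer`, `predict_throughput`, `select_replica`, and the site labels are hypothetical.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Optional, Sequence


@dataclass
class Transfer:
    """One remembered instance: a past transfer from a replica site (hypothetical schema)."""
    site: str
    file_size_mb: float
    throughput_mbps: float  # throughput observed for that transfer


def predict_throughput(history: Sequence[Transfer], site: str,
                       file_size_mb: float, k: int = 3) -> Optional[float]:
    """Average the throughput of the k past transfers from `site`
    whose file size is closest to the requested size."""
    instances = [t for t in history if t.site == site]
    if not instances:
        return None  # no stored instances for this site
    nearest = sorted(instances,
                     key=lambda t: abs(t.file_size_mb - file_size_mb))[:k]
    return mean(t.throughput_mbps for t in nearest)


def select_replica(history: Sequence[Transfer], candidate_sites: Sequence[str],
                   file_size_mb: float, k: int = 3) -> str:
    """Pick the candidate site with the highest predicted throughput."""
    predictions = {}
    for site in candidate_sites:
        predicted = predict_throughput(history, site, file_size_mb, k)
        if predicted is not None:
            predictions[site] = predicted
    if not predictions:
        # Assumption: with no history at all, fall back to the first candidate.
        return candidate_sites[0]
    return max(predictions, key=predictions.get)


# Usage with made-up history values (illustrative only, not measured data).
history = [
    Transfer("siteA", 100, 10.0),
    Transfer("siteA", 120, 12.0),
    Transfer("siteA", 1000, 40.0),
    Transfer("siteB", 110, 20.0),
    Transfer("siteB", 900, 15.0),
]
best_small = select_replica(history, ["siteA", "siteB"], file_size_mb=100, k=2)
best_large = select_replica(history, ["siteA", "siteB"], file_size_mb=1000, k=1)
```

A design like this keeps the data requirement small: it needs no network monitoring service or CPU forecasts, only the stored instances themselves, which matches the "limited data sources" setting the abstract describes.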
Similar resources
An Efficient Data Replication Strategy in Large-Scale Data Grid Environments Based on Availability and Popularity
Data grid technology, which uses the scale of the Internet to overcome storage limitations for huge amounts of data, has become one of the hot research topics. Recently, data replication strategies have been widely employed in distributed environments to copy frequently accessed data to suitable sites. The primary purposes are shortening the distance of file transmission and achieving files from ...
Performance Analysis of Applying Replica Selection Technology for Data Grid Environments
The Data Grid enables the sharing, selection, and connection of a wide variety of geographically distributed computational and storage resources for solving large-scale data-intensive scientific applications. Such technology efficiently manages and transfers terabytes or even petabytes of data for data-intensive, high-performance computing applications in wide-area, distributed computing environme...
GRESS - a Grid Replica Selection Service
Grid technologies and infrastructures facilitate distributed resource sharing and coordination in dynamic, heterogeneous, multi-institutional environments. A replica catalog is a Grid component that keeps replica locations of data objects and provides location transparency to data access. Replica selection is of great importance to data-intensive scientific computing targeted by many data Grid ...
PSO-Grid Data Replication Service
Data grid replication is critical for improving the performance of data-intensive applications. Most data replication techniques use Replica Location Services (RLS) to resolve the logical names of files to their physical locations. An example of such a service is Giggle, which can be found in the OGSA/Globus architecture. Classical algorithms also need some catalog and optimization s...
Increasing performance in Data grid by a new replica replacement algorithm
Data Grid provides sharing services for very large data around the world. Data replication is one of the most effective approaches to reduce access latency and response time. In addition to the benefits, replication has costs such as storage and bandwidth consumption, especially when storage space is low and limited. Therefore, the data replacement should be done wisely. In this p...
Publication date: 2004